Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 20 de 35
Filtrar
Más filtros










Base de datos
Intervalo de año de publicación
1.
ArXiv ; 2024 Feb 28.
Artículo en Inglés | MEDLINE | ID: mdl-38463508

RESUMEN

Accurate blind docking has the potential to lead to new biological breakthroughs, but for this promise to be realized, docking methods must generalize well across the proteome. Existing benchmarks, however, fail to rigorously assess generalizability. Therefore, we develop DockGen, a new benchmark based on the ligand-binding domains of proteins, and we show that existing machine learning-based docking models have very weak generalization abilities. We carefully analyze the scaling laws of ML-based docking and show that, by scaling data and model size, as well as integrating synthetic data strategies, we are able to significantly increase the generalization capacity and set new state-of-the-art performance across benchmarks. Further, we propose Confidence Bootstrapping, a new training paradigm that solely relies on the interaction between diffusion and confidence models and exploits the multi-resolution generation process of diffusion models. We demonstrate that Confidence Bootstrapping significantly improves the ability of ML-based docking methods to dock to unseen protein classes, edging closer to accurate and generalizable blind docking methods.

2.
ArXiv ; 2024 Jan 08.
Artículo en Inglés | MEDLINE | ID: mdl-38259348

RESUMEN

Protein design often begins with knowledge of a desired function from a motif which motif-scaffolding aims to construct a functional protein around. Recently, generative models have achieved breakthrough success in designing scaffolds for a diverse range of motifs. However, the generated scaffolds tend to lack structural diversity, which can hinder success in wet-lab validation. In this work, we extend FrameFlow, an SE(3) flow matching model for protein backbone generation, to perform motif-scaffolding with two complementary approaches. The first is motif amortization, in which FrameFlow is trained with the motif as input using a data augmentation strategy. The second is motif guidance, which performs scaffolding using an estimate of the conditional score from FrameFlow, and requires no additional training. Both approaches achieve an equivalent or higher success rate than previous state-of-the-art methods, with 2.5 times more structurally diverse scaffolds. Code: https://github.com/microsoft/frame-flow.

3.
ArXiv ; 2023 Dec 01.
Artículo en Inglés | MEDLINE | ID: mdl-38076509

RESUMEN

High-throughput drug screening -- using cell imaging or gene expression measurements as readouts of drug effect -- is a critical tool in biotechnology to assess and understand the relationship between the chemical structure and biological activity of a drug. Since large-scale screens have to be divided into multiple experiments, a key difficulty is dealing with batch effects, which can introduce systematic errors and non-biological associations in the data. We propose InfoCORE, an Information maximization approach for COnfounder REmoval, to effectively deal with batch effects and obtain refined molecular representations. InfoCORE establishes a variational lower bound on the conditional mutual information of the latent representations given a batch identifier. It adaptively reweighs samples to equalize their implied batch distribution. Extensive experiments on drug screening data reveal InfoCORE's superior performance in a multitude of tasks including molecular property prediction and molecule-phenotype retrieval. Additionally, we show results for how InfoCORE offers a versatile framework and resolves general distribution shifts and issues of data fairness by minimizing correlation with spurious features or removing sensitive attributes. The code is available at https://github.com/uhlerlab/InfoCORE.

4.
Science ; 382(6677): eadi1407, 2023 Dec 22.
Artículo en Inglés | MEDLINE | ID: mdl-38127734

RESUMEN

A closed-loop, autonomous molecular discovery platform driven by integrated machine learning tools was developed to accelerate the design of molecules with desired properties. We demonstrated two case studies on dye-like molecules, targeting absorption wavelength, lipophilicity, and photooxidative stability. In the first study, the platform experimentally realized 294 unreported molecules across three automatic iterations of molecular design-make-test-analyze cycles while exploring the structure-function space of four rarely reported scaffolds. In each iteration, the property prediction models that guided exploration learned the structure-property space of diverse scaffold derivatives, which were realized with multistep syntheses and a variety of reactions. The second study exploited property models trained on the explored chemical space and previously reported molecules to discover nine top-performing molecules within a lightly explored structure-property space.

5.
ArXiv ; 2023 Dec 07.
Artículo en Inglés | MEDLINE | ID: mdl-38106455

RESUMEN

Molecular docking is critical to structure-based virtual screening, yet the throughput of such workflows is limited by the expensive optimization of scoring functions involved in most docking algorithms. We explore how machine learning can accelerate this process by learning a scoring function with a functional form that allows for more rapid optimization. Specifically, we define the scoring function to be the cross-correlation of multi-channel ligand and protein scalar fields parameterized by equivariant graph neural networks, enabling rapid optimization over rigid-body degrees of freedom with fast Fourier transforms. The runtime of our approach can be amortized at several levels of abstraction, and is particularly favorable for virtual screening settings with a common binding pocket. We benchmark our scoring functions on two simplified docking-related tasks: decoy pose scoring and rigid conformer docking. Our method attains similar but faster performance on crystal structures compared to the widely-used Vina and Gnina scoring functions, and is more robust on computationally predicted structures. Code is available at https://github.com/bjing2016/scalar-fields.

6.
Nature ; 620(7976): 1089-1100, 2023 Aug.
Artículo en Inglés | MEDLINE | ID: mdl-37433327

RESUMEN

There has been considerable recent progress in designing new proteins using deep-learning methods1-9. Despite this progress, a general deep-learning framework for protein design that enables solution of a wide range of design challenges, including de novo binder design and design of higher-order symmetric architectures, has yet to be described. Diffusion models10,11 have had considerable success in image and language generative modelling but limited success when applied to protein modelling, probably due to the complexity of protein backbone geometry and sequence-structure relationships. Here we show that by fine-tuning the RoseTTAFold structure prediction network on protein structure denoising tasks, we obtain a generative model of protein backbones that achieves outstanding performance on unconditional and topology-constrained protein monomer design, protein binder design, symmetric oligomer design, enzyme active site scaffolding and symmetric motif scaffolding for therapeutic and metal-binding protein design. We demonstrate the power and generality of the method, called RoseTTAFold diffusion (RFdiffusion), by experimentally characterizing the structures and functions of hundreds of designed symmetric assemblies, metal-binding proteins and protein binders. The accuracy of RFdiffusion is confirmed by the cryogenic electron microscopy structure of a designed binder in complex with influenza haemagglutinin that is nearly identical to the design model. In a manner analogous to networks that produce images from user-specified inputs, RFdiffusion enables the design of diverse functional proteins from simple molecular specifications.


Asunto(s)
Aprendizaje Profundo , Proteínas , Dominio Catalítico , Microscopía por Crioelectrón , Glicoproteínas Hemaglutininas del Virus de la Influenza/química , Glicoproteínas Hemaglutininas del Virus de la Influenza/metabolismo , Glicoproteínas Hemaglutininas del Virus de la Influenza/ultraestructura , Unión Proteica , Proteínas/química , Proteínas/metabolismo , Proteínas/ultraestructura
7.
Nat Chem Biol ; 19(11): 1342-1350, 2023 Nov.
Artículo en Inglés | MEDLINE | ID: mdl-37231267

RESUMEN

Acinetobacter baumannii is a nosocomial Gram-negative pathogen that often displays multidrug resistance. Discovering new antibiotics against A. baumannii has proven challenging through conventional screening approaches. Fortunately, machine learning methods allow for the rapid exploration of chemical space, increasing the probability of discovering new antibacterial molecules. Here we screened ~7,500 molecules for those that inhibited the growth of A. baumannii in vitro. We trained a neural network with this growth inhibition dataset and performed in silico predictions for structurally new molecules with activity against A. baumannii. Through this approach, we discovered abaucin, an antibacterial compound with narrow-spectrum activity against A. baumannii. Further investigations revealed that abaucin perturbs lipoprotein trafficking through a mechanism involving LolE. Moreover, abaucin could control an A. baumannii infection in a mouse wound model. This work highlights the utility of machine learning in antibiotic discovery and describes a promising lead with targeted activity against a challenging Gram-negative pathogen.


Asunto(s)
Acinetobacter baumannii , Aprendizaje Profundo , Animales , Ratones , Antibacterianos/farmacología , Farmacorresistencia Bacteriana Múltiple , Pruebas de Sensibilidad Microbiana
8.
ArXiv ; 2023 Apr 05.
Artículo en Inglés | MEDLINE | ID: mdl-37064532

RESUMEN

Protein structure prediction has reached revolutionary levels of accuracy on single structures, yet distributional modeling paradigms are needed to capture the conformational ensembles and flexibility that underlie biological function. Towards this goal, we develop EigenFold, a diffusion generative modeling framework for sampling a distribution of structures from a given protein sequence. We define a diffusion process that models the structure as a system of harmonic oscillators and which naturally induces a cascading-resolution generative process along the eigenmodes of the system. On recent CAMEO targets, EigenFold achieves a median TMScore of 0.84, while providing a more comprehensive picture of model uncertainty via the ensemble of sampled structures relative to existing methods. We then assess EigenFold's ability to model and predict conformational heterogeneity for fold-switching proteins and ligand-induced conformational change. Code is available at https://github.com/bjing2016/EigenFold.

9.
Mol Syst Biol ; 18(9): e11081, 2022 09.
Artículo en Inglés | MEDLINE | ID: mdl-36065847

RESUMEN

Efficient identification of drug mechanisms of action remains a challenge. Computational docking approaches have been widely used to predict drug binding targets; yet, such approaches depend on existing protein structures, and accurate structural predictions have only recently become available from AlphaFold2. Here, we combine AlphaFold2 with molecular docking simulations to predict protein-ligand interactions between 296 proteins spanning Escherichia coli's essential proteome, and 218 active antibacterial compounds and 100 inactive compounds, respectively, pointing to widespread compound and protein promiscuity. We benchmark model performance by measuring enzymatic activity for 12 essential proteins treated with each antibacterial compound. We confirm extensive promiscuity, but find that the average area under the receiver operating characteristic curve (auROC) is 0.48, indicating weak model performance. We demonstrate that rescoring of docking poses using machine learning-based approaches improves model performance, resulting in average auROCs as large as 0.63, and that ensembles of rescoring functions improve prediction accuracy and the ratio of true-positive rate to false-positive rate. This work indicates that advances in modeling protein-ligand interactions, particularly using machine learning-based approaches, are needed to better harness AlphaFold2 for drug discovery.


Asunto(s)
Antibacterianos , Benchmarking , Antibacterianos/farmacología , Ligandos , Simulación del Acoplamiento Molecular , Unión Proteica , Proteínas/metabolismo
10.
Proc Natl Acad Sci U S A ; 118(39)2021 09 28.
Artículo en Inglés | MEDLINE | ID: mdl-34526388

RESUMEN

Effective treatments for COVID-19 are urgently needed. However, discovering single-agent therapies with activity against severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has been challenging. Combination therapies play an important role in antiviral therapies, due to their improved efficacy and reduced toxicity. Recent approaches have applied deep learning to identify synergistic drug combinations for diseases with vast preexisting datasets, but these are not applicable to new diseases with limited combination data, such as COVID-19. Given that drug synergy often occurs through inhibition of discrete biological targets, here we propose a neural network architecture that jointly learns drug-target interaction and drug-drug synergy. The model consists of two parts: a drug-target interaction module and a target-disease association module. This design enables the model to utilize drug-target interaction data and single-agent antiviral activity data, in addition to available drug-drug combination datasets, which may be small in nature. By incorporating additional biological information, our model performs significantly better in synergy prediction accuracy than previous methods with limited drug combination training data. We empirically validated our model predictions and discovered two drug combinations, remdesivir and reserpine as well as remdesivir and IQ-1S, which display strong antiviral SARS-CoV-2 synergy in vitro. Our approach, which was applied here to address the urgent threat of COVID-19, can be readily extended to other diseases for which a dearth of chemical-chemical combination data exists.


Asunto(s)
Antivirales/farmacología , Tratamiento Farmacológico de COVID-19 , Aprendizaje Profundo , Adenosina Monofosfato/análogos & derivados , Alanina/análogos & derivados , Supervivencia Celular/efectos de los fármacos , Combinación de Medicamentos , Interacciones Farmacológicas , Sinergismo Farmacológico , Humanos , SARS-CoV-2
12.
J Med Chem ; 63(16): 8667-8682, 2020 08 27.
Artículo en Inglés | MEDLINE | ID: mdl-32243158

RESUMEN

Artificial intelligence and machine learning have demonstrated their potential role in predictive chemistry and synthetic planning of small molecules; there are at least a few reports of companies employing in silico synthetic planning into their overall approach to accessing target molecules. A data-driven synthesis planning program is one component being developed and evaluated by the Machine Learning for Pharmaceutical Discovery and Synthesis (MLPDS) consortium, comprising MIT and 13 chemical and pharmaceutical company members. Together, we wrote this perspective to share how we think predictive models can be integrated into medicinal chemistry synthesis workflows, how they are currently used within MLPDS member companies, and the outlook for this field.


Asunto(s)
Técnicas de Química Sintética/métodos , Química Farmacéutica/métodos , Aprendizaje Automático , Industria Química/métodos , Descubrimiento de Drogas/métodos , Modelos Químicos , Investigación Farmacéutica/métodos
13.
Cell ; 180(4): 688-702.e13, 2020 02 20.
Artículo en Inglés | MEDLINE | ID: mdl-32084340

RESUMEN

Due to the rapid emergence of antibiotic-resistant bacteria, there is a growing need to discover new antibiotics. To address this challenge, we trained a deep neural network capable of predicting molecules with antibacterial activity. We performed predictions on multiple chemical libraries and discovered a molecule from the Drug Repurposing Hub-halicin-that is structurally divergent from conventional antibiotics and displays bactericidal activity against a wide phylogenetic spectrum of pathogens including Mycobacterium tuberculosis and carbapenem-resistant Enterobacteriaceae. Halicin also effectively treated Clostridioides difficile and pan-resistant Acinetobacter baumannii infections in murine models. Additionally, from a discrete set of 23 empirically tested predictions from >107 million molecules curated from the ZINC15 database, our model identified eight antibacterial compounds that are structurally distant from known antibiotics. This work highlights the utility of deep learning approaches to expand our antibiotic arsenal through the discovery of structurally distinct antibacterial molecules.


Asunto(s)
Antibacterianos/farmacología , Descubrimiento de Drogas/métodos , Aprendizaje Automático , Tiadiazoles/farmacología , Acinetobacter baumannii/efectos de los fármacos , Animales , Antibacterianos/química , Quimioinformática/métodos , Clostridioides difficile/efectos de los fármacos , Bases de Datos de Compuestos Químicos , Ratones , Ratones Endogámicos BALB C , Ratones Endogámicos C57BL , Mycobacterium tuberculosis/efectos de los fármacos , Bibliotecas de Moléculas Pequeñas/química , Bibliotecas de Moléculas Pequeñas/farmacología , Tiadiazoles/química
15.
J Chem Inf Model ; 59(8): 3370-3388, 2019 08 26.
Artículo en Inglés | MEDLINE | ID: mdl-31361484

RESUMEN

Advancements in neural machinery have led to a wide range of algorithmic solutions for molecular property prediction. Two classes of models in particular have yielded promising results: neural networks applied to computed molecular fingerprints or expert-crafted descriptors and graph convolutional neural networks that construct a learned molecular representation by operating on the graph structure of the molecule. However, recent literature has yet to clearly determine which of these two methods is superior when generalizing to new chemical space. Furthermore, prior research has rarely examined these new models in industry research settings in comparison to existing employed models. In this paper, we benchmark models extensively on 19 public and 16 proprietary industrial data sets spanning a wide variety of chemical end points. In addition, we introduce a graph convolutional model that consistently matches or outperforms models using fixed molecular descriptors as well as previous graph neural architectures on both public and proprietary data sets. Our empirical findings indicate that while approaches based on these representations have yet to reach the level of experimental reproducibility, our proposed model nevertheless offers significant improvements over models currently used in industrial workflows.


Asunto(s)
Redes Neurales de la Computación , Gráficos por Computador
16.
Chem Sci ; 10(2): 370-377, 2019 Jan 14.
Artículo en Inglés | MEDLINE | ID: mdl-30746086

RESUMEN

We present a supervised learning approach to predict the products of organic reactions given their reactants, reagents, and solvent(s). The prediction task is factored into two stages comparable to manual expert approaches: considering possible sites of reactivity and evaluating their relative likelihoods. By training on hundreds of thousands of reaction precedents covering a broad range of reaction types from the patent literature, the neural model makes informed predictions of chemical reactivity. The model predicts the major product correctly over 85% of the time requiring around 100 ms per example, a significantly higher accuracy than achieved by previous machine learning approaches, and performs on par with expert chemists with years of formal training. We gain additional insight into predictions via the design of the neural model, revealing an understanding of chemistry qualitatively consistent with manual approaches.

17.
J Am Stat Assoc ; 113(523): 1296-1310, 2018.
Artículo en Inglés | MEDLINE | ID: mdl-30906084

RESUMEN

We present a nonparametric framework to model a short sequence of probability distributions that vary both due to underlying effects of sequential progression and confounding noise. To distinguish between these two types of variation and estimate the sequential-progression effects, our approach leverages an assumption that these effects follow a persistent trend. This work is motivated by the recent rise of single-cell RNA-sequencing experiments over a brief time course, which aim to identify genes relevant to the progression of a particular biological process across diverse cell populations. While classical statistical tools focus on scalar-response regression or order-agnostic differences between distributions, it is desirable in this setting to consider both the full distributions as well as the structure imposed by their ordering. We introduce a new regression model for ordinal covariates where responses are univariate distributions and the underlying relationship reflects consistent changes in the distributions over increasing levels of the covariate. This concept is formalized as a trend in distributions, which we define as an evolution that is linear under the Wasserstein metric. Implemented via a fast alternating projections algorithm, our method exhibits numerous strengths in simulations and analyses of single-cell gene expression data.

18.
J Chem Inf Model ; 57(8): 1757-1772, 2017 08 28.
Artículo en Inglés | MEDLINE | ID: mdl-28696688

RESUMEN

The task of learning an expressive molecular representation is central to developing quantitative structure-activity and property relationships. Traditional approaches rely on group additivity rules, empirical measurements or parameters, or generation of thousands of descriptors. In this paper, we employ a convolutional neural network for this embedding task by treating molecules as undirected graphs with attributed nodes and edges. Simple atom and bond attributes are used to construct atom-specific feature vectors that take into account the local chemical environment using different neighborhood radii. By working directly with the full molecular graph, there is a greater opportunity for models to identify important features relevant to a prediction task. Unlike other graph-based approaches, our atom featurization preserves molecule-level spatial information that significantly enhances model performance. Our models learn to identify important features of atom clusters for the prediction of aqueous solubility, octanol solubility, melting point, and toxicity. Extensions and limitations of this strategy are discussed.


Asunto(s)
Gráficos por Computador , Informática/métodos , Redes Neurales de la Computación , Fenómenos Físicos , Octanoles/química , Solubilidad , Pruebas de Toxicidad , Temperatura de Transición , Agua/química
19.
ACS Cent Sci ; 3(5): 434-443, 2017 May 24.
Artículo en Inglés | MEDLINE | ID: mdl-28573205

RESUMEN

Computer assistance in synthesis design has existed for over 40 years, yet retrosynthesis planning software has struggled to achieve widespread adoption. One critical challenge in developing high-quality pathway suggestions is that proposed reaction steps often fail when attempted in the laboratory, despite initially seeming viable. The true measure of success for any synthesis program is whether the predicted outcome matches what is observed experimentally. We report a model framework for anticipating reaction outcomes that combines the traditional use of reaction templates with the flexibility in pattern recognition afforded by neural networks. Using 15 000 experimental reaction records from granted United States patents, a model is trained to select the major (recorded) product by ranking a self-generated list of candidates where one candidate is known to be the major product. Candidate reactions are represented using a unique edit-based representation that emphasizes the fundamental transformation from reactants to products, rather than the constituent molecules' overall structures. In a 5-fold cross-validation, the trained model assigns the major product rank 1 in 71.8% of cases, rank ≤3 in 86.7% of cases, and rank ≤5 in 90.8% of cases.

20.
Genome Res ; 26(10): 1430-1440, 2016 10.
Artículo en Inglés | MEDLINE | ID: mdl-27456004

RESUMEN

Enhancers and promoters commonly occur in accessible chromatin characterized by depleted nucleosome contact; however, it is unclear how chromatin accessibility is governed. We show that log-additive cis-acting DNA sequence features can predict chromatin accessibility at high spatial resolution. We develop a new type of high-dimensional machine learning model, the Synergistic Chromatin Model (SCM), which when trained with DNase-seq data for a cell type is capable of predicting expected read counts of genome-wide chromatin accessibility at every base from DNA sequence alone, with the highest accuracy at hypersensitive sites shared across cell types. We confirm that a SCM accurately predicts chromatin accessibility for thousands of synthetic DNA sequences using a novel CRISPR-based method of highly efficient site-specific DNA library integration. SCMs are directly interpretable and reveal that a logic based on local, nonspecific synergistic effects, largely among pioneer TFs, is sufficient to predict a large fraction of cellular chromatin accessibility in a wide variety of cell types.


Asunto(s)
Ensamble y Desensamble de Cromatina , Cromatina/genética , Modelos Genéticos , Animales , Cromatina/metabolismo , Genoma Humano , Humanos , Aprendizaje Automático
SELECCIÓN DE REFERENCIAS
DETALLE DE LA BÚSQUEDA
...